Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Open FlyteFile from remote path #2991

Merged

Conversation

JiangJiaWei1103
Copy link
Contributor

@JiangJiaWei1103 JiangJiaWei1103 commented Dec 8, 2024

Tracking issue

Closes flyteorg/flyte#6090.

Why are the changes needed?

Directly opening a local or remote file with FlyteFile interface is quite useful. Also, this is commonly used with a context manager like:

with open(ff, "r") as f:
    # Read your file here
    # ...

Following discuss both the local and remote cases:

  1. Directly open FlyteFile from a local path
ff = FlyteFile(path=local_path)
with open(ff, "r") as f:
    # Read your local file here
    # ...

In this case, we don't need to download the file because it's already on the local file system. Hence, we just run a dummy downloading.

  1. Directly open FlyteFile from a remote path (our focus in this PR)
ff = FlyteFile(path=remote_path)
with open(ff, "r") as f:
    # Read your remote file here
    # ...

As for the remote file, we definitely need to download the file from remote to our local file system before the file is accessed.

With this fix, users can directly open either a local or remote file in a consistent manner.

What changes were proposed in this pull request?

  1. Make downloader a static method of FlyteFilePathTransformer.
    • Its behavior is logically related to FlyteFilePathTransformer but doesn't need to interact with class or instance data.
  2. Explicitly setup remote file source and local file destination in __fspath__ magic method of FlyteFile.
    • Support remote file downloading.

How was this patch tested?

This patch is first tested with the following code snippet:

import os

from flytekit import task
from flytekit.types.file import FlyteFile


# Remote path
remote_path = "s3://my-s3-bucket/test.json"


@task
def create_ff() -> FlyteFile:
    ff = FlyteFile(path=remot_path)
    with open(ff, "r") as f:
        content = f.read()
        print(content)

    return ff


if __name__ == "__main__":
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner

    runner = CliRunner()
    path = os.path.realpath(__file__)
    result = runner.invoke(pyflyte.main, ["run", path, "create_ff"])
    print(result.output)

The result is shown as follows:

Screenshot 2024-12-08 at 9 58 23 PM

Setup process

git clone https://github.com/flyteorg/flytekit.git
gh pr checkout 2991
make setup && pip install -e .

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Summary by Bito

This PR enhances the FlyteFile interface by implementing a delayed downloading mechanism for remote files and improving remote file operations handling. The changes include a new static downloader method, improved path handling functionality, and better separation between file creation and reading operations. The implementation removes AWS endpoint configuration mocks and provides a consistent interface for both local and remote file scenarios.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 2

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
@JiangJiaWei1103
Copy link
Contributor Author

JiangJiaWei1103 commented Dec 8, 2024

Follow-ups

  1. Write integration test
    • Still pondering how to enable s3 interaction in integration test.
  2. Experiment on FlyteDirectory to see if there's similar issue

@Future-Outlier Future-Outlier self-assigned this Dec 9, 2024
@eapolinario
Copy link
Collaborator

eapolinario commented Dec 18, 2024

Thank you for your change, this indeed looks useful! I have one suggestion:

Write integration test

  • Still pondering how to enable s3 interaction in integration test.

You should be able to add FlyteFile specific integration tests that hit the minio container in the sandbox. Take a look at https://github.com/flyteorg/flytekit/blob/master/.github/workflows/pythonbuild.yml#L264-L266, if you specify those environment variables you should be able to write to an minio s3 bucket.

@JiangJiaWei1103
Copy link
Contributor Author

Hi @eapolinario,

Amazing, thanks very much for the guide. I'll add the corresponding integration test asap!

@JiangJiaWei1103
Copy link
Contributor Author

Hi @eapolinario,

This patch is now covered by the newly added integration test. If there's anything that can be improved, please let me know. Thanks a lot!

ff._remote_source = uri

return ff

@staticmethod
def downloader(remote_path: str, local_path: str) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the context that we pass to async_to_python_value?

Suggested change
def downloader(remote_path: str, local_path: str) -> None:
def downloader(ctx: FlyteContext, remote_path: str, local_path: str) -> None:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

We could just reuse the passed context instead of getting it again from FlyteContextManager.

Comment on lines 330 to 337
if ctx.file_access.is_remote(self.path) and self._remote_source is None:
# Setup remote file source and local file destination
self._remote_source = self.path
local_path = ctx.file_access.get_random_local_path(self._remote_source)
self._downloader = lambda: FlyteFilePathTransformer.downloader(
remote_path=self._remote_source, local_path=local_path
)
self.path = local_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to move these code to __init__ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Kevin,

After some more tests, I observe that setting up these paths within __init__ leads to an error. This use case is described as follows:

  1. Initialize a FlyteFile with a remote path
  2. Call FlyteFilePathTransformer.async_to_literal before FlyteFile is opened

The error shows that the local file destination to which we download the remote file hasn't been created because the we do delayed downloading until users call open() explicitly.

To be concrete, a non-existing source_path will be used here. Hence, a FileNotFoundError occurs.

If there's any suggestion, please let me know. Thanks a lot!

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
1. Remove redundant prints
2. Use `mock.patch.dict` to setup `os.environ` for the current test fn
    * Avoid contaminating other tests running in the same process

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
@JiangJiaWei1103
Copy link
Contributor Author

We clean the test logic in the last commit following the suggestion here.

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
@flyte-bot
Copy link
Contributor

flyte-bot commented Dec 28, 2024

Code Review Agent Run #82f00f

Actionable Suggestions - 5
  • tests/flytekit/integration/remote/test_remote.py - 2
  • flytekit/types/file/file.py - 2
  • tests/flytekit/integration/remote/workflows/basic/flytefile.py - 1
    • Consider adding file operation error handling · Line 15-18
Review Details
  • Files reviewed - 4 · Commit Range: d9af440..2cb0a2c
    • flytekit/types/file/file.py
    • tests/flytekit/integration/remote/test_remote.py
    • tests/flytekit/integration/remote/utils.py
    • tests/flytekit/integration/remote/workflows/basic/flytefile.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

@flyte-bot
Copy link
Contributor

flyte-bot commented Dec 28, 2024

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
Bug Fix - Remote File Access Enhancement

file.py - Added support for delayed downloading of remote files and improved path handling

test_remote.py - Added integration tests for remote file access functionality

utils.py - Added utility class for file transfer operations in tests

flytefile.py - Added workflow test cases for FlyteFile operations

Comment on lines +819 to +821
# Upload a file to minio s3 bucket
file_transfer = SimpleFileTransfer()
remote_file_path = file_transfer.upload_file(file_type="json")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use context manager for resource cleanup

The test creates temporary files but relies on manual cleanup. Consider using a context manager or pytest.fixture to ensure cleanup happens even if the test fails

Code suggestion
Check the AI-generated fix before applying
Suggested change
# Upload a file to minio s3 bucket
file_transfer = SimpleFileTransfer()
remote_file_path = file_transfer.upload_file(file_type="json")
@pytest.fixture
def remote_test_file():
file_transfer = SimpleFileTransfer()
remote_file_path = file_transfer.upload_file(file_type="json")
yield remote_file_path
url = urlparse(remote_file_path)
bucket, key = url.netloc, url.path.lstrip("/")
file_transfer.delete_file(bucket=bucket, key=key)

Code Review Run #82f00f


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Comment on lines +826 to +827
execution = remote.wait(execution=execution, timeout=datetime.timedelta(minutes=5))
assert execution.closure.phase == WorkflowExecutionPhase.SUCCEEDED, f"Execution failed with phase: {execution.closure.phase}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider handling workflow execution failures

Consider adding error handling for the case where workflow execution fails. Currently, only checking for SUCCEEDED phase but not handling other phases like FAILED, ABORTED, etc.

Code suggestion
Check the AI-generated fix before applying
Suggested change
execution = remote.wait(execution=execution, timeout=datetime.timedelta(minutes=5))
assert execution.closure.phase == WorkflowExecutionPhase.SUCCEEDED, f"Execution failed with phase: {execution.closure.phase}"
execution = remote.wait(execution=execution, timeout=datetime.timedelta(minutes=5))
if execution.closure.phase == WorkflowExecutionPhase.FAILED:
raise RuntimeError(f"Workflow execution failed: {execution.closure.error}")
elif execution.closure.phase == WorkflowExecutionPhase.ABORTED:
raise RuntimeError("Workflow execution was aborted")
elif execution.closure.phase != WorkflowExecutionPhase.SUCCEEDED:
raise RuntimeError(f"Unexpected execution phase: {execution.closure.phase}")
assert execution.closure.phase == WorkflowExecutionPhase.SUCCEEDED

Code Review Run #82f00f


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

self._downloader()
self._downloaded = True
return self.path
return self._local_path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding path validation check

Consider adding a check for self._local_path before returning it to ensure it's not None or empty.

Code suggestion
Check the AI-generated fix before applying
Suggested change
return self._local_path
if not self._local_path:
raise ValueError("Local path is not initialized")
if not os.path.exists(self._local_path):
raise FileNotFoundError(f"Local path {self._local_path} does not exist")

Code Review Run #82f00f


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Comment on lines +744 to +753
def downloader(
ctx: FlyteContext, remote_path: typing.Union[str, os.PathLike], local_path: typing.Union[str, os.PathLike]
) -> None:
"""
Download data from remote_path to local_path.

We design the downloader as a static method because its behavior is logically
related to this class but don't need to interact with class or instance data.
"""
ctx.file_access.get_data(remote_path, local_path, is_multipart=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding error handling for downloads

Consider adding error handling in the downloader method to handle potential file access or I/O errors that may occur during file download operations.

Code suggestion
Check the AI-generated fix before applying
 @@ -744,10 +744,15 @@
  def downloader(
          ctx: FlyteContext, remote_path: typing.Union[str, os.PathLike], local_path: typing.Union[str, os.PathLike]
      ) -> None:
 -        """
          Download data from remote_path to local_path.
 -        """
 -        ctx.file_access.get_data(remote_path, local_path, is_multipart=False)
 +        try:
 +            ctx.file_access.get_data(remote_path, local_path, is_multipart=False)
 +        except Exception as e:
 +            logger.error(f"Failed to download file from {remote_path} to {local_path}: {str(e)}")
 +            raise

Code Review Run #82f00f


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Comment on lines 15 to 18
ff = FlyteFile(path=remote_file_path)
with open(ff, "r") as f:
content = f.read()
print(f"FILE CONTENT | {content}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding file operation error handling

Consider adding error handling when opening and reading the file. The current implementation may raise exceptions if the file cannot be opened or read.

Code suggestion
Check the AI-generated fix before applying
Suggested change
ff = FlyteFile(path=remote_file_path)
with open(ff, "r") as f:
content = f.read()
print(f"FILE CONTENT | {content}")
ff = FlyteFile(path=remote_file_path)
try:
with open(ff, "r") as f:
content = f.read()
print(f"FILE CONTENT | {content}")
except (IOError, OSError) as e:
print(f"Error reading file: {e}")
raise

Code Review Run #82f00f


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Comment on lines 812 to 816
@mock.patch.dict(os.environ, {
"AWS_ENDPOINT_URL": "http://localhost:30002",
"AWS_ACCESS_KEY_ID": "minio",
"AWS_SECRET_ACCESS_KEY": "miniostorage"
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove this after this PR.
#3001

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me fix it!

Copy link
Member

@Future-Outlier Future-Outlier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @arbaobao
can you add these to tests?
It mocks to cases

  1. create and read file from s3 in same task pod
  2. create and read file in different task pod (create in 1st pod, then pass it to 2nd pod to read)
import os

from flytekit import task, workflow
from flytekit.types.file import FlyteFile


# Remote path
remote_path = "s3://my-s3-bucket/s3_flyte_dir/example.txt"


@task
def create_ff() -> FlyteFile:
    return FlyteFile(path=remote_path)

@task
def read_ff(ff: FlyteFile):
    with open(ff, "r") as f:
        content = f.read()
        print("=============================")
        print(content)
        print("=============================")
@task
def create_and_read_ff() -> FlyteFile:
    ff = FlyteFile(path=remote_path)
    with open(ff, "r") as f:
        content = f.read()
        print("=============================")
        print(content)
        print("=============================")
    return ff

@workflow
def wf():
    f1 = create_ff()
    read_ff(f1)
    f2 = create_and_read_ff()
    read_ff(f2)


if __name__ == "__main__":
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner

    runner = CliRunner()
    path = os.path.realpath(__file__)
    result = runner.invoke(pyflyte.main, ["run", path, "wf"])
    print(result.output)

@JiangJiaWei1103
Copy link
Contributor Author

Nice! I'll add these test cases. Thanks for your suggestion!

Please refer to flyteorg#3001

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
Create ff in one task pod and read it in another task pod.

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
@flyte-bot
Copy link
Contributor

flyte-bot commented Jan 2, 2025

Code Review Agent Run #75cc24

Actionable Suggestions - 1
  • tests/flytekit/integration/remote/workflows/basic/flytefile.py - 1
    • Consider consolidating duplicated file reading logic · Line 18-20
Review Details
  • Files reviewed - 2 · Commit Range: 2cb0a2c..3c02d19
    • tests/flytekit/integration/remote/test_remote.py
    • tests/flytekit/integration/remote/workflows/basic/flytefile.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

Comment on lines +18 to +20
with open(ff, "r") as f:
content = f.read()
print(f"FILE CONTENT | {content}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider consolidating duplicated file reading logic

Consider consolidating the file reading logic in read_ff() and create_and_read_ff() to avoid code duplication. The file reading code block appears in both functions.

Code suggestion
Check the AI-generated fix before applying
 @@ -12,9 +12,13 @@
 +def _read_file_content(ff: FlyteFile) -> None:
 +    with open(ff, "r") as f:
 +        content = f.read()
 +        print(f"FILE CONTENT | {content}")
 +
  def read_ff(ff: FlyteFile) -> None:
      """Read input FlyteFile.
      This can be used in the case in which a FlyteFile is created
      in another task pod and read in this task pod.
      """
 -    with open(ff, "r") as f:
 -        content = f.read()
 -        print(f"FILE CONTENT | {content}")
 +    _read_file_content(ff)

 @@ -33,9 +33,7 @@
      """
      ff = FlyteFile(path=file_path)
 -    with open(ff, "r") as f:
 -        content = f.read()
 -        print(f"FILE CONTENT | {content}")
 +    _read_file_content(ff)

Code Review Run #75cc24


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

Copy link
Member

@Future-Outlier Future-Outlier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing PR, learn a lot from you, let's make flyte better and better together.
really excited to collaborate with you.

@Future-Outlier Future-Outlier merged commit f3996f6 into flyteorg:master Jan 3, 2025
102 checks passed
@JiangJiaWei1103
Copy link
Contributor Author

Thanks man! There's always something to learn from you. Let's go!

granthamtaylor added a commit that referenced this pull request Jan 6, 2025
* Store protos in local cache (#3022)

* Store proto obj instead of model Literal in local cache

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Remove unused file

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

---------

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Bump aiohttp from 3.9.5 to 3.10.11 (#3018)

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.5 to 3.10.11.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.9.5...v3.10.11)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix bug in FlyteDirectory.listdir on local files (#2926)

* Fix issue in FlyteDirectory.listdir

Fixes flyteorg/flyte#6005

Signed-off-by: Pim de Haan <pim@cusp.ai>

* Added test

Signed-off-by: Pim de Haan <pim@cusp.ai>

* Run make lint

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

---------

Signed-off-by: Pim de Haan <pim@cusp.ai>
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>

* Fix unit tests in airflow plugin (#3024)

Signed-off-by: Kevin Su <pingsutw@apache.org>

* fix: Fix resource meta typos for async agent (#3023)

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* fix: format commands output (#3026)

* Fix pydantic basemodel default input (#3013)

* Fix pydantic default input

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* add pydantic integration test

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Use duck typing by Thomas's advice

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* lint

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* [BUG] Open FlyteFile from remote path (#2991)

* fix: Open FlyteFile from remote path

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Add integration test

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Use ctx as param instead of recreation

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Clean test logic

1. Remove redundant prints
2. Use `mock.patch.dict` to setup `os.environ` for the current test fn
    * Avoid contaminating other tests running in the same process

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Setup local path and downloader in constructor

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Move SimpleFileTransfer to an utility file

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove redundant env var setup

Please refer to #3001

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* test: Add another ff use case

Create ff in one task pod and read it in another task pod.

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

---------

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* vllm inference plugin (#2967)

* vllm inference plugin

Signed-off-by: Daniel Sola <daniel.sola@union.ai>

* fixed default value

Signed-off-by: Daniel Sola <daniel.sola@union.ai>

---------

Signed-off-by: Daniel Sola <daniel.sola@union.ai>

* Add poetry to image spec (#3025)

* Add poetry to image spec

Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>

* Add stricter check

Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>

---------

Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>

* [test] Add integration test for accessing sd sttr in dc (#2969)

* test: Add integration test for attr access of sd

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Correct file path

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* test: Support interaction with minio s3 bucket

1. Upload a local parquet file to minio s3 bucket
2. Access StructuredDataset attr from a dataclass
3. Open StructuredDataset from a remote path

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Delete an unmerged integration test

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Try imagespec with commit sha of corresponding fix

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove redundant test

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove default_factory and create sd dc from input uri

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Clean test logic

1. Remove redundant prints
2. Use `mock.patch.dict` to setup `os.environ` for the current test fn
    * Avoid contaminating other tests running in the same process

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove redundant minio env var setup and add test comments

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Support uploading tmp pqt file

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Udpate deprecated module

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove redundant and unused imports

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

---------

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

---------

Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Pim de Haan <pim@cusp.ai>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: Daniel Sola <daniel.sola@union.ai>
Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Eduardo Apolinario <653394+eapolinario@users.noreply.github.com>
Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pim de Haan <pimdehaan@gmail.com>
Co-authored-by: Kevin Su <pingsutw@apache.org>
Co-authored-by: 江家瑋 <36886416+JiangJiaWei1103@users.noreply.github.com>
Co-authored-by: V <0426vincent@gmail.com>
Co-authored-by: Han-Ru Chen (Future-Outlier) <eric901201@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Daniel Sola <40698988+dansola@users.noreply.github.com>
shuyingliang pushed a commit to shuyingliang/flytekit that referenced this pull request Jan 11, 2025
* fix: Open FlyteFile from remote path

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Add integration test

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Use ctx as param instead of recreation

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Clean test logic

1. Remove redundant prints
2. Use `mock.patch.dict` to setup `os.environ` for the current test fn
    * Avoid contaminating other tests running in the same process

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Setup local path and downloader in constructor

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* refactor: Move SimpleFileTransfer to an utility file

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* Remove redundant env var setup

Please refer to flyteorg#3001

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

* test: Add another ff use case

Create ff in one task pod and read it in another task pod.

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>

---------

Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
Signed-off-by: Shuying Liang <shuying.liang@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] FlyteFile from a remote path should be read correctly in a task
5 participants